ABSTRACT



In a distributed heterogeneous computer system having
a plurality of computer nodes each operatively connected
through a network interface to a network to
provide for communications and transfers of data between
the nodes and wherein the nodes each have a
queue for containing jobs to be performed, an improvement
for dynamically reallocating the system's resources
for optimized job performance. There is first
logic at each node for dynamically and periodically
calculating and saving a workload value as a function of
the number of jobs on the node's queue. Second logic is
provided at each node for transferring the node's workload
value to other nodes on the network at the request
of the other nodes. Finally, there is third logic at each
node operable at the completion of each job. The third
logic includes logic for checking the node's own workload
value, logic for polling all the other nodes for their
workload value if the checking node's workload value
is below a preestablished value indicating the node as
being underutilized and available to do more jobs, logic
for checking the workload values of the other nodes as
received, and logic for transferring a job from the queue
of the other of the nodes having the highest workload
value over a preestablished value indicating the other of
the nodes as being overburdened and requiring job
relief to the queue of the checking node. The third logic is
also operable periodically when the node is idle.

16 Claims, 3 Drawing Sheets

DYNAMIC RESOURCE ALLOCATION SCHEME
FOR DISTRIBUTED HETEROGENEOUS
COMPUTER SYSTEMS

ORIGIN OF THE INVENTION 

The invention described herein was made in the per- 
formance of work under a NASA contract, and is sub- 
ject to the provisions of Public Law 96-517 (35 USC 
202) in which the Contractor has elected not to retain 
title. 

TECHNICAL FIELD 

The invention relates to resource allocation in com- 
puter systems and, more particularly, to a method and 
associated apparatus for shortening response time and 
improving efficiency of a heterogeneous distributed
networked computer system by reallocating the jobs 
queued up for busy nodes to idle, or less-busy nodes. In 
accordance with a novel algorithm, the load-sharing is 
initiated by the server device in a manner such that 
extra overhead is not imposed on the system during 
heavily-loaded conditions. 

BACKGROUND ART

In distributed networked computer systems there is a 
high probability that one of the workstations will be idle 
while others are overloaded. Thus, the response times 
for certain tasks are longer than they should be if all the 
capabilities in the system could be shared fully. As is
known in the art, the solution is to reallocate tasks from 
queues connected to busy computers to idle computer 
queues. 

As depicted in FIG. 1, a distributed computer system
10 consists of several computers 12 with the same or
different processing capabilities, connected together by
a network 14. Each of the computers 12 has tasks 16
assigned to it for execution. In such a distributed multi-computer
system, the probability is high that one of the
computers 12 is idle while another computer 12 has
more than one task 16 waiting in the queue for service.
This probability is called the "imbalance probability".
A high imbalance probability typically implies poor
system performance. By reallocating queued tasks or
jobs to the idle or lightly-loaded computers 12, a reduction
in system response time can be expected. This technique
is called "load sharing" and is one of the main foci
of this invention. As also depicted in FIG. 1, such redistribution
of the tasks 16 on a dynamic basis is known in
the art. Typically, there is a control computer 18 attached
to the network 14 containing task lists 20. On
various bases, the control computer 18 dynamically
reassigns tasks 16 from the lists 20 to various computers
12 within the system 10. For example, it is known in the
art to have each of the computers 12 provide the control
computer 18 with an indicator of the amount of
computing time on tasks that is actually taking place.
The control computer 18, with knowledge of the
amount of use of each computer 12 available, is then
able to reallocate the tasks 16 as necessary. In military
computer systems, and the like, this ability to reconfigure,
redistribute, and keep the system running is an
important part of what is often referred to as "graceful
degradation"; that is, the system 10 continues to operate
as best it can to do the tasks at hand on a priority basis
for as long as it can.

The inventors herein did a considerable amount of
statistical analysis and evaluation of networked computer
systems according to the known prior techniques
for load distribution and redistribution. Their findings
will now be set forth by way of example to provide a
clear picture of the background and basis for the present
invention.

The imbalance probability, IP, for a heterogeneous
system can be calculated by mathematical techniques
well known to those skilled in the art which, per se, are
no part of the novelty of the present invention. There is
a finite, calculable probability that I out of N computers
comprising a networked system are idle. There is
also a finite probability that all stations other than those
I stations are busy, as well as a probability that there is
exactly one job in each one of the remaining (N-I) stations,
i.e. a finite probability that at least one out of (N-I)
stations has one or more jobs waiting for service. By
summing over the number of idle stations, I, from 1 to N,
the imbalance probability for the whole system can be
obtained. By way of example, in a homogeneous system,
all the nodes (i.e. computers 12) have the same service
rate and the same arrival rate. As the number of nodes
increases, the peak of the imbalance probability goes
higher. As the number of nodes increases to twenty, the
imbalance probability approaches 1 when the traffic
intensity (arrival rate divided by the service rate at each
node) ranges from 40% to 80%. The statistical curves
also indicate that the probability of imbalance is high
during moderate traffic intensity. This occurs because,
outside the moderate range, all nodes tend to be either
idle (i.e. there is low traffic intensity) or busy (i.e. there
is high traffic intensity).

If the arrival rate is not evenly distributed, the imbalance
probability becomes even higher. Consider, for example,
the imbalance probability of a two-node heterogeneous
system in which the faster node is twice as fast as the
slower one and the work is evenly distributed. Because
the work is not balanced against the capacity of each node,
it has been observed that the imbalance probability goes
even higher during high traffic intensity at the slower
node. At this point, the slower node is heavily loaded
even though the faster node is only 50% utilized.
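
As a rough illustration of the two preceding paragraphs, the imbalance probability can be sketched in closed form under a simplifying assumption that is not stated in this document: each node behaves as an independent M/M/1 queue whose utilization is its arrival rate divided by its service rate. The function name and the independence assumption are illustrative only, and the inventors' own calculation may differ.

```python
import math
from typing import Sequence


def imbalance_probability(utilizations: Sequence[float]) -> float:
    """P(at least one node is idle AND at least one node has a job waiting).

    Assumes (hypothetically) independent M/M/1 nodes, so for utilization rho:
      P(0 jobs) = 1 - rho, P(exactly 1 job) = (1 - rho) * rho, P(2+ jobs) = rho**2.
    """
    for rho in utilizations:
        if not 0.0 <= rho < 1.0:
            raise ValueError("each utilization must lie in [0, 1)")
    # Inclusion-exclusion on the complements of the two events.
    p_no_node_idle = math.prod(rho for rho in utilizations)
    p_no_job_waiting = math.prod(1.0 - rho ** 2 for rho in utilizations)
    p_every_node_has_one = math.prod(rho * (1.0 - rho) for rho in utilizations)
    return 1.0 - p_no_node_idle - p_no_job_waiting + p_every_node_has_one


# Homogeneous case: twenty nodes at 60% traffic intensity.
homogeneous = imbalance_probability([0.6] * 20)

# Two-node heterogeneous case: equal arrival rates, faster node twice as fast,
# so the faster node sees half the utilization of the slower one.
two_node = imbalance_probability([0.9, 0.45])
```

Under these assumptions the twenty-node value exceeds 0.99, in line with the behavior described above, and the two-node value is about 0.47.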

Numerous studies have addressed the problem of 
resource-sharing in distributed systems. It is convenient 
to classify these strategies as being either static or dy- 
namic in nature and as having either a centralized or 
decentralized decision-making capability. One can fur- 
ther distinguish the algorithms by the type of node that 
takes the initiative in the resource-sharing. Algorithms 
can either be sender-initiated or server-initiated. Some
algorithms can be adapted to a generalized heterogeneous
system while others can only be used in a homogeneous
system. These categories are further explained
as follows:

Static/Dynamic: Static schemes use only the infor- 
mation about the long-term average behavior of the 
system, i.e. they ignore the current state. Dynamic 
schemes differ from static schemes by determining how 
and when to transfer jobs based on the time-dependent 
current system state instead of the average behavior of 
the system. The major drawback of static algorithms is 
that they do not respond to fluctuations of the work- 
load. Dynamic schemes attempt to correct this draw- 
back but are more difficult to implement and may intro- 
duce additional overhead. In addition, dynamic 
schemes are hard to analyze. 

Centralized/Decentralized: In a system with centralized
control, jobs are assumed to arrive at the central
controller which is responsible for distributing the jobs
among the network's nodes; in a decentralized system,
jobs are submitted to the individual nodes and the decision
to transfer a job to another node is made locally.
This central dispatcher approach is quite restrictive for
a distributed system.

Homogeneous/Heterogeneous system: In the homogeneous
system, all the computer nodes are identical
and have the same service rate. In the heterogeneous
system, the computer nodes do not have the same processing
power.

Sender/Server Initiated: If the source node makes a
determination as to where to route a job, this is defined
as a sender-initiated strategy. In server-initiated strategies,
the situation is reversed, i.e., lightly-loaded nodes
search for congested nodes from which work may be
transferred.

The prior art as discussed in the literature (see Listing 
of Cited References hereinafter) will now be addressed 
with particularity. 

First, there are the static strategies. Stone [Ston 78]
developed a centralized maximum flow algorithm for
two processors (i.e. computer nodes) by holding the
load of one processor fixed and varying the load on the
other processor. Ni and Hwang [Hwan 81] studied the
problem of load balancing in a multiple heterogeneous
processor system with many job classes. In this system,
the number of processors was extended to more than
two. Tantawi and Towsley [Tant 85] formulated the
static resource-sharing problem as a nonlinear programming
problem and presented two efficient algorithms,
the parametric-study algorithm and the load-balancing
problem. Silva and Gerla [Silv 84] used a downhill
queueing procedure to search for the static optimal job
assignment in a heterogeneous system that supports
multiple job classes and site constraints. Recently,
Kurose and Singh [Kuro 86] used an iterative algorithm
to deal with the static decentralized load-sharing problem.
Their algorithm was examined by theoretical and
simulation techniques.

Next, there are the dynamic strategies. Chow and
Kohler [Chow 79] used a queueing theory approach to
examine a resource-sharing algorithm for a heterogeneous
two-processor system with a central dispatcher.
Their objective was to minimize the mean response
time. Foschini and Salz [Fosc 79] generalized one of the
methods developed by Chow and Kohler to include
multiple job dispatchers. Wah [Wah 84] studied the
communication overhead of a centralized resource-sharing
scheme designed for a homogeneous system.
Load-balancing of the Purdue ECN (Engineering Computer
Network) was implemented with a dynamic decentralized
RXE (Remote Execution Environment) program
[Hwan 82]. With the decentralized RXE, the load
information of all the processors was maintained in each
network machine's kernel. One of the problems with
this approach is the potentially high cost of obtaining
the required state information. It is also possible for an
idle processor to acquire jobs from several processors
and thus become overloaded. Ni and Xu [Ni 85] propose
the "draft" algorithm for a homogeneous system.
Wah and Juang [Wah 85] propose a window control
algorithm to schedule the resource in local computer
systems with a multi-access network. Wang and Morris
[Wang 85] studied ten different algorithms for homogeneous
systems to evaluate the performance differences.
Eager, et al. [Eage 86] addressed the problem of decentralized
load sharing in a multiple system using dynamic-state
information. Eager discussed the appropriate
level of complexity for load-sharing policies and
showed that schemes that use relatively simple state
information do very well and perform quite closely to
the optimal expected performance. The system configuration
studied by Eager, et al. was also a homogeneous
system. Towsley and Lee [Tows 86] used the threshold
of the local job queue length at each host to make decisions
for remote processing. This computer system was
generalized to be a heterogeneous system.

In summary, most of the work reported in the literature
has been limited to either static schemes, centralized
control, homogeneous systems, or to two-processor
systems where overhead considerations were ignored.
All of these approaches make assumptions that
are too restricted to apply to most real computer system
installations. The main contribution of this reported
work is the development of a dynamic, decentralized,
resource-sharing algorithm for a heterogeneous multiple
(i.e. greater than two) processor system. Because it
is server-initiated, this approach thus differs significantly
from the sender-initiated approach described in
[Tows 86]. The disadvantage of this prior art sender-initiated
approach is that it imposes extra overhead in
the heavily-loaded situation and therefore, it could
bring the system to an unstable state.

Proc. Tenth Int'l Conf. Parallel Processing, pp. 352-357,
August 1981.

[Ni 85] Ni, L. M. "A Distributed Drafting Algorithm
for Load Balancing," IEEE Trans. on Software Eng.,
Vol. SE-11, No. 10, October 1985.

[Silv 84] Silva, E. S. and Gerla, M. "Load Balancing
in Distributed Systems with Multiple Classes and Site
Constraints," Performance 84 (North Holland), pp.
[illegible]-33, 1984.

[Ston 78] Stone, H. S. "Critical Load Factors in Two-Processor
Distributed Systems," IEEE Trans. on Software
Engineering, Vol. SE-4, No. 3, pp. 254-258, May 1978.

[Tows 86] Towsley, D. and Lee, K. J. "A Comparison
of Priority-Based Decentralized Load Balancing
Policies," ACM Performance Evaluation Review, Vol.
14, No. 1, pp. 70-77, May 1986.

[Tows ] Towsley, D. and Lee, K. J. "On the Analysis
of a Decentralized Load Sharing Policy in Heterogeneous
Distributed Systems," Submitted to IEEE Trans.
[remainder illegible]

[Tant 85] Tantawi, A. N. and Towsley, D. "Optimal
[remainder illegible]

[Wah 85] Wah, B. and Lien, Y. N. "Design of Distributed
Database on Local Computer Systems with a
Multi-Access Network," IEEE Trans. Software Engineering,
Vol. SE-11, No. 7, July 1985.

[Wah 85] Wah, B. and Juang, J. Y. "Resource Scheduling
for Local Computer Systems with a Multi-Access
Network," IEEE Trans. on Computers, Vol. C-34, No.
12, December 1985.

[Wang 85] Wang, Y. T. and Morris, R. J. T. "Load
Sharing in Distributed Systems," IEEE Trans. on Computers,
Vol. C-34, pp. 204-217, March 1985.

The foregoing articles and reports from the literature
are only generally relevant for background discussion
purposes and, since copies are not readily available to
applicants for filing herewith, they are not being provided.
In addition to the foregoing non-supplied articles
from the literature, however, copies of the following
relevant U.S. Letters Patent are being provided herewith:

[1] Hoschler, H., Raimar, W., and Brandmaier, K.
"Method of Operating a Data Processing System," U.S.
Pat. No. 4,099,235, July 4, 1978.

[2] Kitajima, H. and Ohmachi, K. "Processing Request
Allocator for Assignment of Loads in a Distributed
Processing System," U.S. Pat. No. 4,495,570, Jan.
[date illegible].

[3] Fry, S. M., Hempy, H. O., and Kittinger, B. E.
"Balancing Data-Processing Workloads," U.S. Pat. No.
4,403,286, Sept. 6, 1983.

With respect to the above-listed U.S. Patents and the
teaching thereof vis-a-vis the present invention to be
described hereinafter, the inventors herein have invented
a new dynamic load-balancing scheme for a
distributed computer system consisting of a number of
heterogeneous hosts connected by a local area network
(LAN). As mentioned above, numerous studies have
addressed the problem of resource-sharing in distributed
systems. For purposes of discussion and distinguishing,
it is convenient to classify these strategies as
being either static or dynamic in nature and as having
either a centralized or decentralized decision-making
capability. One can further distinguish the algorithms
employed by the type of node that takes the initiative in
the resource-sharing. Algorithms can be either sender-initiated
or receiver-initiated. Some algorithms can be
adapted to a generalized heterogeneous system while
others can be used only in a homogeneous system.
These categories are further addressed with respect to
the above-referenced prior art patents as follows.

Centralized/Decentralized: In a system with centralized
control (as shown in FIG. 1) jobs arrive at the
central control computer 18 which is responsible for
distributing the jobs among the network nodes. In a
decentralized system, jobs are submitted to the individual
nodes and the decision to transfer a job to another
node is made locally. The central dispatcher approach is
quite restrictive for a distributed system. In the teachings
of their patent, Kitajima and Ohmachi assign a
processing request allocator to be the single controller
of a centralized scheme. One of the problems with
this approach is the potentially high cost of obtaining
the required state information. It is also possible for an
idle processor to acquire jobs from several processors
and thus become overloaded.

Homogeneous/Heterogeneous System: In the homogeneous
system, all the computer nodes are identical
and have the same service rate. In the heterogeneous
system, the computer nodes do not have the same processing
power. In their patent, Fry, Hempy, and Kittinger
disclose a scheme to balance data-processing
workloads on a homogeneous environment.

Sender/Receiver Initiated: If the source node makes
a determination as to where to route a job, this is defined
as a sender-initiated strategy. In receiver-initiated
strategies, the situation is reversed, i.e., lightly-loaded
nodes search for congested nodes from which work
may be transferred. In their patent, Hoschler, Raimar,
and Brandmaier disclose a sender-initiated
scheme. The inventors herein have proved that the
receiver-initiated approach is superior at medium to
heavy loads and, therefore, have incorporated such an
approach in their invention in a novel manner.

Static/Dynamic: Static schemes use only the information
about the long-term average behavior of the
system, i.e. they ignore the current state. Dynamic
schemes differ from the static schemes by determining
how and when to transfer jobs based on the time-dependent
current system state instead of the average
behavior. The major drawback of static algorithms is
that they do not respond to fluctuations of the workload.
Dynamic schemes attempt to correct this drawback.

STATEMENT OF THE INVENTION

Accordingly, an object of the invention is the providing
of a dynamic decentralized resource-sharing algorithm
for a heterogeneous multi-processor system.

It is another object of the invention to provide a
dynamic decentralized resource-sharing algorithm for a
heterogeneous multi-processor system which is receiver-initiated
in heavy load so that it does not impose
extra overhead in the heavily-loaded situation and,
therefore, will not bring the system to an unstable state.

Another object of the present invention is to prevent
an idle node in a heterogeneous multi-processor system
from becoming isolated from the resource-sharing process,
as can happen with the systems of Fry and
Kitajima, by providing a wakeup timer used at each idle
node to periodically cause the idle node to search for a
job that can be transferred from a heavily-loaded node.

Still another object of the present invention is to use
the local queue length and the local service rate ratio at
each node as a more efficient workload indicator.

It is yet a further object of the invention to provide a
dynamic decentralized resource-sharing algorithm for a
heterogeneous multi-processor system which dynamically
adjusts to the traffic load and does not generate
extra overhead during high traffic loading conditions
and, therefore, cannot bring the system to an unstable
state.

The foregoing objects have been achieved in a distributed
heterogeneous computer system having a plurality
of computer nodes each operatively connected
through a network interface to a network to provide for
communications and transfers of data between the
nodes and wherein the nodes each have a queue for
containing jobs to be performed, by the improvement of
the present invention for dynamically reallocating the
system's resources for optimized job performance.
There is first logic at each node for dynamically and
periodically calculating and saving a workload value as
a function of the number of jobs on the node's queue.
Second logic is provided at each node for transferring
the node's workload value to other nodes on the network
at the request of the other nodes. Finally, there is
third logic at each node operable at the completion of
each job. The third logic includes logic for checking
the node's own workload value, logic for polling all the
other nodes for their workload value if the checking
node's workload value is below a pre-established value
indicating the node as being underutilized and available
to do more jobs, logic for checking the workload values
of the other nodes as received, and logic for transferring
a job from the queue of the other of the nodes having
the highest workload value over a pre-established value
indicating the other of the nodes as being overburdened
and requiring job relief to the queue of the checking
node. The third logic is also operable periodically when
the node is idle.

Other objects and advantages of the present invention
will become apparent from the description which follows
hereinafter when taken in conjunction with the
drawing figures which accompany it.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a simplified block diagram of a prior art
distributed computer system employing a control computer
to distribute and redistribute the tasks among the
computer nodes on the network.

FIG. 2 is a simplified block diagram of a distributed
computer system according to the present invention.

FIG. 3 is a functional block diagram of a computer
from the system of FIG. 2 pointing out the portions
thereof related to the present invention.

FIG. 4 is a functional block diagram of the system of
FIG. 2 showing the manner in which tasks are dynamically
reallocated according to the dual mode approach of the
present invention.

FIG. 5 is a flowchart of the logic of the method of the
present invention.

DETAILED DESCRIPTION OF THE
INVENTION

The new and novel resource-sharing algorithm of the
present invention, which the inventors call the Server-Initiated
Dynamic Resource-Sharing Algorithm (SIDA
for short), will now be presented. A queueing analysis
of the algorithm's performance is presented and the
result is validated by the reporting of simulation results.
The basic environment is as depicted in FIG. 2; that is,
there are a plurality of computer nodes 12' distributed
across a network 14 without the need for a control
computer 18 as in the prior art of FIG. 1.

As observed earlier herein, most of the work of the
prior art has been limited to either static schemes, centralized
control, or homogeneous systems. All of these
approaches make assumptions that are much too restrictive
to apply to most real computer system installations.
The main contribution of the present invention is the
providing of a dynamic, decentralized, resource-sharing
algorithm for a heterogeneous, multi-processor, computer
system (such as that generally indicated as 10' in
FIG. 2). The algorithm employed in the present invention
uses a dual-mode, server-initiated approach which
is clearly novel over anything taught or suggested by
the prior art. Jobs are transferred from heavily burdened
nodes (i.e. over a high threshold limit) to low
burdened (or idle) nodes at the initiation of the receiving
node when (1) a job finishes at a node which is
burdened below a pre-established threshold level or (2)
a node has been idle for a period of time as established
by a wakeup timer at the node. The important advantage
of this approach is that, unlike the prior art approaches,
it does not impose extra overhead in the
heavily-loaded situation. Therefore, it cannot bring the
system to an unstable state. The algorithm also has two
important advantages over the prior art. First, to prevent
an idle node from becoming isolated from the
resource-sharing process, the wakeup timer is included.
As will be addressed in greater detail shortly, the
wakeup timer is used at each idle node to periodically
cause the idle node to search for a job that can be transferred
from a heavily-loaded node. Second, this invention
uses a combination of the local queue length and
the local service rate ratio at each node as the workload
indicator. In a heterogeneous computer system, it is
more efficient to use this workload indicator rather than
just the local queue length as employed in the prior art.

It was determined by the inventors herein that an
ideal resource-sharing algorithm should have the following
characteristics:

1) Dynamic: the load distribution should adapt to
rapid system load changes.

2) Decentralized: each processing computer node
should determine, on its own, whether to process a job
locally or to send the job to some other node for processing.
There is no need for a central dispatcher. Since
the central dispatcher is not required, the problem of a
potential single point of failure is eliminated.

3) Server-initiated: a good scheme should only request
job relocation when there are idle processors
available to serve the relocated jobs. By using the server-initiated
approach, the danger of sender-initiated
schemes, i.e. that all the nodes are overloaded and each
attempts to give away jobs causing unproductive overhead
which simply further saturates the already overloaded
system, is eliminated.

The following is an outline of the basic SIDA algorithm
in a computer language type of form:

GIVEN:
  N = Number of total nodes in the heterogeneous system;
  H_i = Service rate of node i;
  Q_i = Queue length at node i.
WHILE
  {[(a job completes at node i) or wakeup-timeout] and
   [Q_i/H_i ≤ LOW_THRESHOLD]}
DO
  probe Q_k for k = 1 to N, k ≠ i;
  identify the node, j, with the MAX(Q_k);
  if Q_j/H_j ≥ HIGH_THRESHOLD then
  DO
    transfer 1 job from node j to node i;
    (* job is processed at node i and the results are returned
       to node j *)
  END
END.
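
The outline above can be sketched in Python as a single decision step. This is a minimal sketch rather than the patented implementation: the `Node` class, the `sida_step` name, and the in-memory list standing in for the network are all hypothetical, and the busiest node is chosen here by the workload indicator Q/H, as the prose description does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    queue_length: int    # Q: jobs currently queued at the node
    service_rate: float  # H: service rate of the node

    def workload(self) -> float:
        """Workload indicator: queue length divided by service rate."""
        return self.queue_length / self.service_rate


def sida_step(nodes: List[Node], i: int, low: float, high: float) -> Optional[int]:
    """Run at node i when a job completes there or its wakeup timer fires.

    Returns the index of the node that gave up a job, or None if no
    transfer took place.
    """
    if nodes[i].workload() > low:
        return None  # node i is not lightly loaded, so it sends no probes
    others = [k for k in range(len(nodes)) if k != i]
    if not others:
        return None
    # Probe every other node and pick the one with the highest workload.
    j = max(others, key=lambda k: nodes[k].workload())
    if nodes[j].workload() < high:
        return None  # nobody is overburdened
    nodes[j].queue_length -= 1  # transfer one job from node j ...
    nodes[i].queue_length += 1  # ... to node i
    return j


# Example: an idle fast node pulls one job from the busiest slow node.
cluster = [Node(0, 2.0), Node(6, 1.0), Node(3, 1.0)]
donor = sida_step(cluster, 0, low=0.5, high=2.0)
```

In the example, node 0 is idle, node 1 has the highest indicator (6.0), and one job moves from node 1 to node 0.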

The environment of the present invention is shown in
greater detail in FIGS. 3 and 4 while the dual mode
algorithm logic is shown in flowchart form in FIG. 5.
Each computer node 12' includes a task queue 22. Pointers
are provided to the top of queue (TOQ), end of
queue (EOQ), and bottom of queue (BOQ). The length
of the active contents of the queue 22 at any time can be
determined by subtracting TOQ from BOQ. The percentage
of the active contents of the queue 22 at any
time is, of course, easily calculated as
(BOQ-TOQ)/(EOQ-TOQ). As depicted in FIG. 3, the operating
system 24, for example, can provide a service rate 26 for
the node 12', i.e. the ratio (percentage) of computational
usage of the node 12' compared with its potential. The
SIDA, as incorporated into the block labelled TASK
ALLOCATION AND TRANSFER LOGIC 28, uses
the length of the local queue 22 and the local service
rate 26 at each node 12' as the workload indicator 30 for
that node 12'. When a job finishes at a node 12', the
logic 28 of the node 12' checks the workload indicator
30 obtained by dividing the queue length by the service
rate. If the workload indicator 30 is less than a certain
low threshold level, the lightly-loaded node 12' initiates
a search for the most busy node 12'.
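
The pointer arithmetic in the paragraph above can be written out as a short sketch. The function names are hypothetical, and the pointers are treated as plain integer offsets, which is an assumption about the queue layout.

```python
def queue_length(toq: int, boq: int) -> int:
    """Length of the active queue contents: BOQ minus TOQ."""
    return boq - toq


def queue_fill_fraction(toq: int, boq: int, eoq: int) -> float:
    """Fraction of the queue in use: (BOQ - TOQ) / (EOQ - TOQ)."""
    return (boq - toq) / (eoq - toq)


def workload_indicator(toq: int, boq: int, service_rate: float) -> float:
    """Workload indicator: queue length divided by the service rate."""
    return queue_length(toq, boq) / service_rate


# Example: queue storage spans offsets 100..120 and holds 4 active entries.
length = queue_length(100, 104)
fraction = queue_fill_fraction(100, 104, 120)
indicator = workload_indicator(100, 104, service_rate=2.0)
```

A node would compare `indicator` with its low threshold after each job completes to decide whether to start a search.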

As depicted in FIG. 4, in the system 10' of the present
invention, the task allocation and transfer logic 28 of
each node 12' is connected to the network 14 through a
network interface 32. Thus, as configured, the task
allocation and transfer logic 28 of each node 12' can
access the last calculated workload indicator 30 of all
the other nodes 12' on the network 14. To accomplish a
search for the most busy node 12', the task allocation
and transfer logic 28 of a node 12' simply requests the
workload indicators 30 of all the other nodes 12' on the
network 14. If the workload indicator 30 of the most
busy node 12' is above a certain high threshold, a job
from the task queue 22 of that busy node 12' is transferred
to the lightly-loaded node 12'. To prevent an idle
node 12' from becoming isolated from the resource-sharing
process, the wakeup timer 34 is used at each idle
node 12' to periodically cause the idle node 12' to search
for a job which can be transferred from a heavily loaded
node 12'. Thus, SIDA provides a method which balances
the workload among the nodes 12', resulting in a
beneficial and substantial improvement in the response
time and throughput performance of the total system. It
should be noted that SIDA adjusts dynamically to the
traffic load of each node 12'. When the workload indicator
of every node 12' is greater than a certain threshold
level, the algorithm generates no overhead, an
important and novel advantage over the prior art.
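
The no-overhead property described above can be illustrated by counting probe messages. This sketch assumes, for illustration only, that each lightly-loaded node sends one request to every other node and that heavily-loaded nodes send none; the function name is hypothetical.

```python
from typing import Sequence


def probe_messages(workloads: Sequence[float], low_threshold: float) -> int:
    """Number of workload requests sent if every node checks its load once."""
    n = len(workloads)
    # Only nodes at or below the low threshold start a search.
    searching = sum(1 for w in workloads if w <= low_threshold)
    return searching * (n - 1)


# Every node is above the threshold: no probes are generated at all.
saturated = probe_messages([3.0, 4.0, 5.0], low_threshold=0.5)

# Two of four nodes are lightly loaded: each probes the other three.
mixed = probe_messages([0.0, 4.0, 5.0, 0.2], low_threshold=0.5)
```

The first call returns 0 and the second returns 6, which shows the probing traffic falling away as the system saturates.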

In the prior art, such as reported in Wah's research
[Wah 85], the priority level of the load-balancing process
is assumed to be lower than that of regular jobs,
thereby preventing the resource-sharing procedure
from inhibiting normal operation. In direct contrast to
this prior art assumption and teaching, however, for
SIDA it is necessary and preferred to assign the highest
priority level to the resource-sharing process. This is
because, otherwise, the lightly-loaded nodes 12' could
not receive the workload indicators 30 from the other
nodes 12' upon which to base a decision and jobs would
never be transferred from the heavily-loaded nodes 12' in
a timely manner, i.e. before the heavily-loaded nodes 12'
become lightly-loaded.

VALIDITY TESTING RESULTS 

The present invention and its novel algorithm were
verified by simulation modeling. A multiple processor
heterogeneous system was considered in which the
service rates of the nodes are not necessarily identical.
Each node was modeled as a queueing center. For a
particular station m, new jobs were assumed to arrive at
rate r_m. The average service rate was S_m and the inter-wakeup
time, which was the same for all nodes, was
1/λ_w. The state of the system was defined to be the number
of jobs in a node, either in the queue or being served.
The objective of the performance analysis was to determine
the effects of resource-sharing according to the
method of the present invention on the average system
response time. These effects are a function of the traffic
intensity, which is defined as the ratio of the job arrival
rate to the job service rate.

The basic assumptions made in this performance 
study were as follows: 

a) There is only one class of tasks. The task arrival 
rate to each processor is exponentially distributed. The 
arrival rates at each processor may be different. 

b) The resource service rates are exponentially dis- 
tributed and may be different for each resource proces- 
sor. 

c) The average inter-wakeup rate λ_w for each idle
node is the same and is exponentially distributed.

d) Each job needs only one resource. 

e) The network transmission delay in propagating 
requests, probing the status of other nodes, and return- 
ing results is negligible. This assumption is valid if the 
transmission bandwidth is large compared to the traffic 
on the network. 

f) All processing overhead for probing, packing and
unpacking data, request and result transfer is ignored.
This assumption is valid if the processing required to 
pack and unpack the job is significantly less than the 
processing required to process the job. 

g) For simplicity, only the queue length was used as 
the workload indicator. 

h) Lightly-loaded nodes polled jobs from any
heavily-loaded node rather than selecting only the most
heavily-loaded one.

i) The low threshold is assumed to be 0. In other 
words, a node tries to poll a job from other nodes while
it is idle. 

j) The high threshold is assumed to be 2. 
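
A simulation under assumptions a) through j) might be sketched as follows. This is not the inventors' simulation model; it is a small event-driven sketch in Python in which the function simulate, its parameters, and the constants LOW and HIGH are names chosen for the illustration. It further assumes that a node counts as heavily-loaded when it holds at least HIGH jobs, and it estimates the average response time from the time-averaged number of jobs in the system by Little's law, a choice made here for brevity. Because all times are exponentially distributed, the next event may be redrawn from the current state after every event.

```python
import random

LOW, HIGH = 0, 2   # thresholds taken from assumptions i) and j)


def simulate(arrival, service, wakeup_rate, share=True, horizon=20000.0, seed=1):
    """Estimate the mean response time of the whole system (Little's law)."""
    rng = random.Random(seed)
    n = [0] * len(arrival)      # jobs at each node, queued or in service
    t = area = 0.0              # clock and time-integral of total jobs

    def try_pull(m):
        # Assumption h): take one job from any node holding at least HIGH jobs.
        donors = [k for k, jobs in enumerate(n) if k != m and jobs >= HIGH]
        if share and donors:
            n[rng.choice(donors)] -= 1
            n[m] += 1

    while t < horizon:
        # Events enabled in the current state, each with its rate.
        events = [(r, "arrive", m) for m, r in enumerate(arrival)]
        events += [(s, "finish", m) for m, s in enumerate(service) if n[m] > 0]
        if share:
            events += [(wakeup_rate, "wake", m)
                       for m, jobs in enumerate(n) if jobs == LOW]
        total = sum(rate for rate, _, _ in events)
        dt = rng.expovariate(total)         # time until the next event
        area += sum(n) * dt
        t += dt
        pick = rng.uniform(0.0, total)      # choose an event in proportion to rate
        for rate, kind, m in events:
            pick -= rate
            if pick <= 0.0:
                break
        if kind == "arrive":
            n[m] += 1
        elif kind == "finish":
            n[m] -= 1
            if n[m] <= LOW:                 # last local job done: probe the others
                try_pull(m)
        else:                               # idle node woke up
            try_pull(m)
    return (area / t) / sum(arrival)
```

Calling simulate twice with the same rates, once with share=True and once with share=False, gives a rough comparison of the average response time with and without resource-sharing for one seeded run.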

Exact analysis of the algorithm was quite complex 
since one is faced with an N-dimensional Markov chain. 
The inventors employed a simplified approximate 
model which they proved to their satisfaction to be 
quite accurate in performance prediction. The approach 
characterized the interaction between the various queues
in terms of their steady-state occupancy probabilities. 
Iteration was then used to update estimates of the inter- 
action. To account for the load from other nodes, the 
inventors divided by the expected number of nodes that 
have more than one job. The findings were as follows.

When the last job is completed and the node is about
to leave state 1, node m always probes all other nodes. 
There are three detailed procedures that take place 
during this state transition: 

1) The last local job finishes.

2) Node m probes all the other nodes to check 
whether any other node has more than one job. Note, 
there is a probability that at least one of the remote 
nodes has one or more jobs waiting in queue for service. 

3) If a remote node has more than one job, the
algorithm transfers one job from the remote node to the
local node and the node remains in state 1. There is a
probability that node m cannot find an overloaded node
and therefore node m transfers to state 0 with the
service rate S_m.

While in an idle state, node m "wakes up" frequently
at the average wakeup rate λ_w. In all cases, after
wakeup, this idle node should be able to poll a job from 
a remote node with the exception of the case where all 
other nodes are either idle or only have one job. 

The inventors saw that the transition rates, and hence 
the steady state solution, for node m depended on the 
steady-state probabilities of the other queues in the
system. Iteration was used to produce a, hopefully, 
close approximate result. From the state-transition dia- 
gram, the inventors calculated the steady-state probabil- 
ities, and found that the results verified the expected 
performance of the algorithm embodied in the present 
invention. 
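
The state transitions described above for a single node m may be illustrated by the following simplified sketch, which is not the inventors' approximate model. It assumes that the probability q of a probe finding another node with more than one job is supplied from outside rather than obtained by iteration, and it omits the jobs that other idle nodes may remove from node m. The function name steady_state and the truncation parameter max_jobs are hypothetical. Under these assumptions node m leaves state 0 at rate r_m + λ_w·q, leaves state 1 for state 0 at rate S_m·(1 − q), and otherwise behaves as an ordinary single-server queue.

```python
def steady_state(r_m, s_m, wakeup_rate, q, max_jobs=50):
    """Steady-state probabilities pi[0..max_jobs] of the job count at node m.

    q is an assumed, externally supplied probability that a probe finds at
    least one other node holding more than one job.
    """
    if not (0 <= q < 1) or not (0 < r_m < s_m):
        raise ValueError("need 0 <= q < 1 and 0 < r_m < s_m")
    # Rate of moving from state k to k+1, for k = 0 .. max_jobs-1.
    up = [r_m + wakeup_rate * q] + [r_m] * (max_jobs - 1)
    # Rate of moving from state k+1 to k, for k = 0 .. max_jobs-1.
    down = [s_m * (1 - q)] + [s_m] * (max_jobs - 1)
    weights = [1.0]
    for u, d in zip(up, down):
        weights.append(weights[-1] * u / d)   # balance across each cut
    total = sum(weights)
    return [w / total for w in weights]
```

With q set to 0 the sketch reduces to an ordinary single-server queue, for which the probability of being idle is one minus the traffic intensity; with q greater than 0 the probability of being idle is lower, which is the qualitative effect of resource-sharing on an underutilized node.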

We claim: 

1. In a distributed heterogeneous computer system 
having a plurality of computer nodes each operatively 
connected through a network interface to a network to 
provide for communications and transfers of data be- 
tween the nodes and wherein the nodes each have a 
queue for containing jobs to be performed, the improve- 
ment for dynamically reallocating the system's re- 
sources for optimized job performance comprising: 

a) means at each node for dynamically and periodically
calculating and saving a workload value as a
function of the number of jobs on the node's queue;

b) means at each node for transferring the node's said
workload value to other nodes on the network at
the request of said other nodes; and,

c) means at each node operable at the completion of
each job,

c1) for checking the node's own said workload
value, 

c2) for polling all the other nodes for their said 
workload value if the checking node's said work- 
load value is below a pre-established value indi- 
cating the node as being underutilized and avail- 
able to do more jobs, 

c3) for checking the said workload values of the 
other nodes as received, and 

c4) for transferring a job from the queue of the other
of the nodes having the highest said workload
value over a pre-established value indicating said
other of the nodes as being overburdened and
requiring job relief to the queue of the checking
node.

2. The improvement to a distributed heterogeneous 
computer system of claim 1 and additionally comprising 
means at each node periodically operable when the 
node is idle: 

a) for checking the node's own said workload value; 

b) for polling all the other nodes for their said workload
value if the checking node's said workload
value is below a pre-established value indicating
the node as being underutilized and available to do 
more jobs; 

c) for checking the said workload values of the other 
nodes as received; and, 

d) for transferring a job from the queue of the other of
the nodes having the highest said workload value 
over a pre-established value indicating said other of 
the nodes as being overburdened and requiring job 
relief to the queue of the checking node. 

3. The improvement to a distributed heterogeneous 
computer system of claim 1 wherein: 

said means at each node for dynamically and periodically
calculating and saving a workload value as a
function of the number of jobs on the node's queue 
comprises means for dividing the number of jobs 
on the node's queue by the service rate of the node. 

4. The improvement to a distributed heterogeneous 
computer system of claim 1 wherein: 

a) the jobs to be performed by the nodes are assigned 
priority levels; and, 

b) said polling of all the other nodes for their said 
workload value by a node is accomplished by the 
node as a job at the highest priority level. 

5. A distributed heterogeneous computer system hav- 
ing dynamic resource allocation comprising: 

a) network means for providing a communications 
path for computer; 

b) a plurality of computer nodes each operatively 
connected through a network interface to said 
network means whereby to communicate and 
transfer data between said nodes, said nodes each 
having a queue for containing jobs to be per- 
formed; 

c) means at each said node for dynamically and peri- 
odically calculating and saving a workload value as 
a function of the number of jobs on said node's 
queue; 

d) means at each node for transferring said node's said
workload value to other nodes over said network
at the request of said other nodes; and,

e) means at each node operable at the completion of
each job,

e1) for checking said node's own said workload
value,

e2) for polling all the other said nodes for their said
workload value if the checking node's said workload
value is below a pre-established value indicating
said node as being underutilized and available
to do more jobs,

e3) for checking the said workload values of the
other said nodes as received, and

e4) for transferring a job from said queue of the
other of said nodes having the highest said workload
value over a pre-established value indicating
said other of said nodes as being overburdened
and requiring job relief to said queue of
the checking node.

6. The distributed heterogeneous computer system of 
claim 5 and additionally comprising means at each said 
node periodically operable when said node is idle: 

a) for checking said node's own said workload value; 

b) for polling all the other said nodes for their said 
workload value if the checking node's said work- 
load value is below a pre-established value indicat- 
ing said node as being underutilized and available 
to do more jobs;

c) for checking the said workload values of the other
said nodes as received; and, 

d) for transferring a job from said queue of the other of
said nodes having the highest said workload value 
over a pre-established value indicating said other of 
said nodes as being overburdened and requiring job 
relief to said queue of the checking node. 

7. The distributed heterogeneous computer system of 
claim 5 wherein: 

said means at each node for dynamically and periodi- 
cally calculating and saving a workload value as a 
function of the number of jobs on said node's queue
comprises means for dividing the number of jobs 
on said node's queue by the service rate of said 
node. 

8. The improvement to a distributed heterogeneous 
computer system of claim 5 wherein: 

a) the jobs to be performed by said nodes are assigned 
priority levels; and, 

b) said polling of all the other nodes for their said 
workload value by a node is accomplished by said 
node as a job at the highest priority level. 

9. In a distributed heterogeneous computer system 
having a plurality of computer nodes each operatively 
connected through a network interface to a network to 
provide for communications and transfers of data be- 
tween the nodes and wherein the nodes each have a 
queue for containing jobs to be performed, the method 
of operation for dynamically reallocating the system's 
resources for optimized job performance comprising 
the steps of: 

a) at each node, dynamically and periodically calculating
and saving a workload value as a function of
the number of jobs on the node's queue; 

b) transferring the node's workload value to other
nodes on the network at the request of the other 
nodes; and, 

c) at each node at the completion of each job,

c1) checking the node's own workload value,

c2) polling all the other nodes for their workload 
value if the checking node's workload value is 
below a pre-established value indicating the 
node as being underutilized and available to do 
more jobs, 

c3) checking the workload values of the other
nodes as received, and

c4) transferring a job from the queue of the other of
the nodes having the highest workload value
over a pre-established value indicating the other
of the nodes as being overburdened and requiring
job relief to the queue of the checking node.

10. The method of operating a distributed heterogeneous
computer system of claim 9 and when the node is
idle additionally comprising the steps of:

a) checking the node's own workload value;

b) polling all the other nodes for their workload value
if the checking node's workload value is below
a pre-established value indicating the node as being
underutilized and available to do more jobs;

c) checking the workload values of the other nodes as
received; and,

d) transferring a job from the queue of the other of the
nodes having the highest workload value over a
pre-established value indicating the other of the
nodes as being overburdened and requiring job
relief to the queue of the checking node.

11. The method of operating a distributed heterogeneous
computer system of claim 9 wherein said step of
dynamically and periodically calculating and saving a
workload value as a function of the number of jobs on
the node's queue comprises the step of:

dividing the number of jobs on the node's queue by
the service rate of the node.

12. The method of operating a distributed heterogeneous
computer system of claim 9 wherein the jobs to
be performed by the nodes are assigned priority levels
and:

said step of polling of all the other nodes for their
workload value by a node is accomplished by the
node as a job at the highest priority level.

13. In a distributed heterogeneous computer system
having a plurality of computer nodes each operatively
connected through a network interface to a network to
provide for communications and transfers of data
between the nodes and wherein the nodes each have a
queue for containing jobs to be performed, the improvement
for dynamically reallocating the system's resources
for optimized job performance comprising:

a) first logic means at each node for dynamically and
periodically calculating and saving a workload
value as a function of the number of jobs on the
node's queue;

b) second logic means at each node for transferring the
node's said workload value to other nodes on the
network at the request of said other nodes; and,

c) third logic means at each node operable at the
completion of each job, said third logic means
including,

c1) means for checking the node's own said workload
value,

c2) means for polling all the other nodes for their
said workload value if the checking node's said
workload value is below a pre-established value
indicating the node as being underutilized and
available to do more jobs,

c3) means for checking the said workload values of
the other nodes as received, and

c4) means for transferring a job from the queue of
the other of the nodes having the highest said
workload value over a pre-established value
indicating said other of the nodes as being overburdened
and requiring job relief to the queue of
the checking node.

14. The improvement to a distributed heterogeneous
computer system of claim 13 wherein:

said third logic means is also operable periodically
when the node is idle.

15. The improvement to a distributed heterogeneous
computer system of claim 13 wherein:

said first logic means at each node comprises means
for dividing the number of jobs on the node's queue
by the service rate of the node.

16. The improvement to a distributed heterogeneous
computer system of claim 13 wherein the jobs to be
performed by the nodes are assigned priority levels and
wherein additionally:

said means for polling of all the other nodes for their
said workload value by a node of said third logic
means includes means for accomplishing said polling
as a job at the highest priority level.
* * * * *